Abstract. In [Spi], we developed a category of databases in which the schema
of a database is represented as a simplicial set. Each simplex corresponds to 
a table in the database. There, our main concern was to find a categorical 
formulation of databases; the simplicial nature of the schemas was to some 
degree unexpected and unexploited. 

In the present note, we show how to use this geometric formulation effectively
on a computer. If we think of each simplex as a polygonal tile, we can
imagine assembling custom databases by mixing and matching tiles. Queries 
on this database can be performed by drawing paths through the resulting 
tile formations, selecting records at the start-point of this path and retrieving 
corresponding records at its end-point. 
1. Introduction

The distinguishing feature of the simplicial model for databases (see [Spi]) is that
the schemas are simplicial sets. In other words, the organization of the data can be 
drawn as a picture consisting of vertices, edges, triangles, etc. The purpose of this 
short note is to explain how the geometric aspect of such a schema can be directly 
useful for navigating data and manipulating tables. 

There are two main applications we wish to emphasize at this time. The first is 
the ability to add "tiles" to an existing database to create a new one. These new 
tiles (which are given as simplices) may come from internal or external sources. For 
example, if a database has one section that involves people, and one section that 
involves US states, it might benefit from importing from an outside source a tile 
(1-simplex) that connects social security numbers to states of residence. 
The second is the ability to draw curves in a schema that indicate by what process
one wishes to use data of one type to find corresponding data of another type. As a 
heuristic example, imagine that one enters odometer readings at a location A, and 
draws a curve through a map beginning at A and ending at B. The system can 
be instructed to output a set of numbers corresponding to the expected readings of 
said odometers given travel along said route from A to B. 

In the following sections, we explain how to visualize simplicial databases, how 
to add tiles to create custom schemas, and how to use paths through a schema to 
indicate table manipulations. 

While simplicial databases are purely mathematical objects, they can be visualized
using a mathematical functor called "geometric realization." Throughout this
paper, we imagine such a visualization, implemented on a modern computer. Moreover,
this implementation would allow the user to indicate aspects of the database
using the computer's mouse; thus we may refer to clicking on simplices, dragging
tiles, or drawing curves through a schema.

2. Visualizing simplicial databases 

A simplicial database consists of a schema X together with a sheaf of data $\mathcal{O}_X$;
this is explained in [Spi]. In the present paper, we take the schema X to be a
symmetric semi-simplicial set, but we sometimes abuse terminology and call it a
"simplicial set" for short. As such, it can be drawn on a user's screen by connecting
together dots (0-simplices), edges (1-simplices), triangles (2-simplices), tetrahedra
(3-simplices), and higher-dimensional tetrahedra (n-simplices, n > 3).

Simplices can only be connected along a common subsimplex. For example, one
cannot attach two triangles together along their spacious interior, or along part 
of some side - only along vertices or edges. We associate to every simplex its set 
of attributes (vertex labels), and we can only glue two simplices together along a 
subsimplex if the labels match up in the obvious way. 

The different simplices of a schema X correspond to different tables in the
database. An n-simplex corresponds to a table with n + 1 columns (one for each vertex),
and with attributes specified by the labels of the vertices. For example, a 2-simplex
may correspond to a table with attributes "First name," "Last name," "SSN." The
schema X specifies how to connect these tables together. For example, two triangles
can be connected along a common edge or just a common vertex.
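The correspondence between simplices and tables can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: each simplex is identified with its set of vertex labels (so the schema is really a simplicial complex), and the names `faces`, `schema`, `person`, and `residence` are hypothetical, not taken from [Spi].

```python
from itertools import combinations

def faces(simplex):
    """Return every non-empty subset of a simplex's vertex labels (its subsimplices)."""
    labels = sorted(simplex)
    return {
        frozenset(combo)
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    }

def schema(*top_simplices):
    """Build a schema as the union of the faces of the given tiles."""
    simplices = set()
    for tile in top_simplices:
        simplices |= faces(tile)
    return simplices

# A 2-simplex (three columns) and a 1-simplex (two columns) sharing the label "SSN".
person = frozenset({"First name", "Last name", "SSN"})
residence = frozenset({"SSN", "State"})

X = schema(person, residence)
glued_along = faces(person) & faces(residence)  # the common subsimplex
```

Here the two tiles can only meet along the vertex labeled "SSN", because that is the only face whose labels match.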

We do not know how best to represent higher-dimensional simplices on a computer
screen, so in the following discussion, we shall imagine that every simplex has
dimension at most 3. We might imagine a 3-simplex as rotating (confined only by
its attachments to the rest of the schema). Perhaps higher-dimensional simplices
can be drawn simply as polygons or complete graphs.

The schema represents the table types, but does not "have data in it." Instead, 
the user should click on a simplex in the schema to see the corresponding table. 
The separation between the schema and the data is represented mathematically as 
the separation between the simplicial set X and the sheaf of data $\mathcal{O}_X$.

Sometimes the data on a schema is only "virtual" in the following sense. Suppose 
we want to consider the operation of adding two integers together. This can be 
represented as a virtual table with three columns, each with datatype "integer." 
In any row, the sum of the integers in the first two cells is the integer in the third 
cell. Of course, we would never want to write down this entire table, but we "know 
it exists" and can use it to compute. In terms of the schema, it appears as a 2-
simplex, but perhaps when one clicks it, the system displays the addition function 
itself and/or a few examples of it, rather than the entire virtual table. 
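One possible way to hold such a virtual table in software is as a membership test plus a lookup, rather than as stored rows. The sketch below is an assumption about how an implementation might do this; the function names are hypothetical.

```python
def in_addition_table(row):
    """Decide whether a row (x, y, total) belongs to the virtual addition table."""
    x, y, total = row
    return x + y == total

def lookup_sum(x, y):
    """Return the unique row of the virtual table whose first two cells are x and y."""
    return (x, y, x + y)

def sample_rows(limit):
    """List a few rows, as a system might when the 2-simplex is clicked."""
    return [lookup_sum(x, y) for x in range(limit) for y in range(limit)]
```

The table is never materialized; rows are produced only on demand.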

3. Adding tiles 

A company may have at its disposal many different sources of factual informa- 
tion, several of which come in the form of tables. The company benefits when users 
have as much facility with these informative units as possible. 

The simplicial model offers the ability to quickly build custom databases by mixing
and matching tiles. A given user's need for information changes from moment
to moment; he or she benefits from the ability to build up a database that will
be most useful for the task at hand. In the simplicial model, the user can do so
by selecting from the set of tables to which he or she has access, each of which is 
represented as a triangle, edge, etc. The user drags various tiles down from the 
library to his or her own workspace and connects them together so as to enable 
quick and flexible queries. The idea of a single database for the whole company 
may start to seem rigid and old-fashioned. 

Tiles may be color-coded, hidden from view, available in suites, etc. There is 
plenty of room for innovation here. For example, the color on older tables may fade 
as the data loses its freshness, verified tables may be indicated with a check-mark
(✓), questionable information may be flagged, etc.

Sometimes, one may wish to get data from an outside source. Such tiles could 
be made available for purchase. It should not be hard to keep track of where each 
tile came from and when it was created. In this sense, we offer a solution to the 
"data provenance" problem. We imagine that companies would visually trademark 
their tiles in an effort to ensure quality and prevent fraud. 

On the opposite end of the spectrum, some tiles may simply be well-known 
mathematical operations such as addition. If a table in X has two columns we wish 
to add together, we can simply attach an "addition tile" to X by connecting its 
"summands" edge to the edge in X representing the two columns in question. In 
Section 4 we shall see how the summation can be performed by drawing a curve
through the "summands edge" and ending at the "sum" vertex of the addition tile. 

Example 3.1. Here we present an example that theoretically fits into the same mold, 
but is somehow different in that we are gluing a vertex along a vertex. 

Suppose we have a tile A in which one of the vertices is of type "date." We always 
have available the one-row, one-column table "today's date." As a schema, it looks 
like a single vertex. We can drag the "today's date" vertex to the date vertex of 
A and drop it; the result will be a tile with date replaced by "today's date." The 
data over that simplex will be the result of selecting only those rows whose date is 
that of today. 
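In relational terms, gluing the one-row vertex onto the date vertex amounts to a selection. A minimal sketch, assuming tables are lists of dictionaries and using a hypothetical `orders` table with a fixed stand-in for today's date:

```python
from datetime import date

def glue_one_row_vertex(table, attribute, value):
    """Keep only the rows of `table` whose `attribute` equals the one-row table's value."""
    return [row for row in table if row[attribute] == value]

orders = [
    {"order": 1, "date": date(2024, 5, 1)},
    {"order": 2, "date": date(2024, 5, 2)},
    {"order": 3, "date": date(2024, 5, 2)},
]
today = date(2024, 5, 2)  # stand-in for date.today(), fixed so the example is reproducible
todays_orders = glue_one_row_vertex(orders, "date", today)
```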

Example 3.2. Each individual may have his or her own identity in the form of a 
tile. This tile may have attributes such as "First name," "Last Name," etc., but 
as a table it (probably) only has one row. Again, this tile may be dragged and
dropped into an existing schema. 

For example, suppose one has a "birth" tile that includes all the data of the 
person's birth (date, time, location, etc.). This tile could be dropped into a "zodiac" 
schema with these vertices as its "input" (boundary). Doing so has the effect of 
selecting all the pertinent data about planetary and stellar locations at that time.
Using methods developed in the next section, we can create a tile with precisely 
this data. Next, one could plug that tile into their choice of "horoscope" schemas 
to return an associated text. 

Example 3.3. Suppose one drags a tile of a given type onto a tile of the same 
shape. If we apply the above ideas to this situation, the result will basically be the 
intersection of the two tables (if the tables have repeated rows, the construction 
will in fact result in the fiber product of the two tables). 

One might instead wish to take the union of the two tables in question. This 
could easily be done. For example, whenever one drags a tile and places it as
a subtile of an existing one, perhaps the machine could ask whether the user is
requesting a union or intersection. If it is a union, the machine would ask for
additional information: do a UNION ALL, a forgetful UNION, or something more
controlled (as in [Spi]).
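The difference between these options can be made concrete with a small sketch. It assumes that tables are lists of tuples which may contain repeated rows, and it reads "forgetful UNION" as a union that drops duplicates; that reading, and all function names, are our own assumptions.

```python
from collections import Counter

def fiber_product(t1, t2):
    """Rows present in both tables; a row occurring m and n times appears m * n times."""
    c1, c2 = Counter(t1), Counter(t2)
    result = []
    for row, count in c1.items():
        result.extend([row] * (count * c2[row]))
    return result

def union_all(t1, t2):
    """Concatenate the tables, keeping every repeated row."""
    return list(t1) + list(t2)

def union_distinct(t1, t2):
    """Concatenate the tables and keep one copy of each row, in first-seen order."""
    return list(dict.fromkeys(union_all(t1, t2)))

t1 = [("Ann", "OR"), ("Ann", "OR"), ("Bob", "WA")]
t2 = [("Ann", "OR"), ("Cy", "ID")]
```

With no repeated rows, `fiber_product` reduces to the ordinary intersection.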

4. Drawing paths 

In the introduction, we mentioned that if one has a table of odometer readings 
and a route through a map, he or she should be able to output odometer readings 
expected after taking the indicated journey. This is literally an application of the 
following more general procedure. We give it as Example 4.1.

Suppose that the user has two tables: $T_1$ has attributes A, B, and $T_2$ has
attributes B, C, D. The user wants to query the database in the following way. Given
a list of data of type A, he wants to select all data in $T_1$ that conform, then use
them to query table $T_2$ and locate all the corresponding data of type C, D. For
example, given a last name $\ell$, find all the social security numbers that correspond
to $\ell$, and return the set of incomes and withheld incomes associated to each.

This query might take a SQL expert a few minutes to construct, but with sim- 
plicial databases it's quite easy. Recall that the schema for the above situation 
consists of an edge and a triangle; the edge has vertices labeled A and B, the tri- 
angle has vertices labeled B, C, and D, and the two are attached at B. Once the 
user has chosen a set of data of type A, he simply clicks the schema at the vertex 
A, and drags a curved line through the schema that ends at the 1-simplex CD.
A good implementation of simplicial databases can then return the desired data of 
type C, D. 
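Read relationally, the curve from A to the edge CD is a select, a join on B, and a project. The following is a minimal sketch of that reading, with tables as lists of dictionaries; the sample rows and the helper names `select`, `join`, `project`, and `query` are hypothetical.

```python
def select(table, attribute, values):
    """Keep the rows whose `attribute` lies in the chosen set of values."""
    return [row for row in table if row[attribute] in values]

def join(left, right, attribute):
    """Pair up rows of the two tables that agree on the shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]

def project(table, attributes):
    """Keep only the listed columns of each row."""
    return [{a: row[a] for a in attributes} for row in table]

T1 = [  # edge AB: last name and SSN
    {"Last name": "Smith", "SSN": "111"},
    {"Last name": "Smith", "SSN": "222"},
    {"Last name": "Jones", "SSN": "333"},
]
T2 = [  # triangle BCD: SSN, income, withheld income
    {"SSN": "111", "Income": 50000, "Withheld": 9000},
    {"SSN": "222", "Income": 72000, "Withheld": 15000},
    {"SSN": "333", "Income": 41000, "Withheld": 7000},
]

def query(last_names):
    chosen = select(T1, "Last name", last_names)     # start at vertex A
    matched = join(chosen, T2, "SSN")                # cross the shared vertex B
    return project(matched, ["Income", "Withheld"])  # end at the edge CD
```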

Example 4.1. One can consider a map M as the schema for a simplicial database, all
of whose simplices have dimension 0 or 1. The vertices of M are the intersections of
roads and the 1-simplices are the road-portions between these intersections. Each
road-portion has a distance in miles, which we can record as an integer d for this
discussion. If each road portion is a 1-simplex in the schema, then over each road
portion we need a table with two columns. We use the table of all pairs of integers
(s, t) where t - s = d.

Now, suppose we have a 1-column table T of odometer-readings. It will be a 
subtable of the 1-column table over any intersection A of the map. One can then 
draw a route through the map from A to B. Using the above techniques, this will 
result in a functor from tables over A to tables over B. Applying this functor to 
T will result in a table over B whose entries are the integers corresponding to the 
new odometer readings given this journey. 
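The odometer computation can be sketched as follows. Each road portion of length d carries the virtual table of pairs (s, t) with t - s = d, which we represent by the function sending s to its unique partner t. The route lengths and readings are made-up sample values, and the function names are hypothetical.

```python
def road_table(d):
    """Virtual table over a road portion of length d: s is paired with t = s + d."""
    return lambda s: s + d

def travel(readings, route):
    """Push a one-column table of readings along a route, given as a list of lengths."""
    for d in route:
        step = road_table(d)
        readings = [step(s) for s in readings]
    return readings

new_readings = travel([12000, 45310], [3, 5, 2])  # three road portions from A to B
```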
Example 4.2. Suppose we keep track of all interactions between pairs of compa-
nies and the date each occurred on. The data consisting of which two companies 
interacted and on what date is a 2-simplex (3 columns). Suppose we also keep 
track of when two companies join forces to create something together. This is also 
represented with a 2-simplex: which two companies are interacting and what they 
create together. 

Given access to these two tiles, a user might decide to attach them along the 
common face. The resulting shape is a rhombus. What is the meaning of drawing a 
curve through this rhombus, beginning at "date" and ending at "common creation"? 
This curve represents a functor from the category of sets of dates to the category of
sets of common creations. 

This functor is computed as follows. Suppose given a date (resp. set of dates). 
Begin by selecting all interactions between companies that occurred on that date 
(resp. set of dates). Some of these interactions will have resulted in "common 
creations." Finish by outputting this set of common creations. 

In other words, this path through the schema represents the query "tell me all 
the common creations that occurred on these dates." 

Example 4.3. Suppose we have a 1-simplex tile in which we have a list of friendship
pairs. In other words, over each vertex is a set of people, and over the 1-simplex 
are those pairs of people that are friends. 

We can grab one vertex of the 1-simplex and drop it onto the other vertex, 
creating a loop. This will have the effect of intersecting the set of people represented 
by the first vertex with the set of people represented by the second vertex. It will 
also "throw away" friendships if their participants are not in both sets of people. 
(In general, we are actually taking a fiber product, not an intersection. However, 
if there are no duplicates, this fiber product is nothing more than an intersection.) 

We can draw a curve through this new loop, say beginning at the vertex, going 
around the loop three times, and ending at the vertex. Given a list of people L 
at the first vertex, the above curve results in the query "name everyone who is a 
friend of a friend of a friend of a person in list L."
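Going around the loop k times corresponds to applying the friendship relation k times. A minimal sketch, assuming friendships are stored as ordered pairs in both directions and using invented sample names:

```python
def friends_of(people, friendships):
    """One trip around the loop: everyone who is a friend of someone in `people`."""
    return {b for a, b in friendships if a in people}

def around_loop(people, friendships, times):
    """Apply the friendship relation `times` times, starting from `people`."""
    current = set(people)
    for _ in range(times):
        current = friends_of(current, friendships)
    return current

pairs = [("Ann", "Bob"), ("Bob", "Cy"), ("Cy", "Dee")]
friendships = set(pairs) | {(b, a) for a, b in pairs}  # friendship is symmetric

result = around_loop({"Ann"}, friendships, 3)
```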

Different paths through a schema may result in different queries, and this may 
be a good thing. However, sometimes one wants to know which pairs of paths 
are guaranteed to return equal queries. It should not be a difficult problem to
determine when that is the case and what kinds of constraints can control for it. 
This is a kind of homotopy problem, and may have topological solutions. These 
solutions will be dependent on the sheaf of data $\mathcal{O}_X$, not just on the shape of the
schema X. 



Appendix A. Technical details 

In this section we give the technical details of the above constructions. None 
of them are hard; the innovation here is in noticing their existence, not in the
mathematics underlying them. In the first subsection below, we give a very brief 
overview of simplicial sets (or more precisely, symmetric semi-simplicial sets). In 
the next two subsections we explicate the category theory behind the concept of 
adding tiles and that of drawing curves to represent queries. 
A.1. Visualizing simplicial databases. Let $\mathbf{Fin}_m$ denote the category of finite
non-empty sets and the monomorphisms between them. The category of semi-simplicial
sets is $\mathcal{S} := \mathrm{Pre}(\mathbf{Fin}_m)$, so that an object in $\mathcal{S}$ is a functor
$X\colon \mathbf{Fin}_m^{op} \to \mathbf{Sets}$ and a morphism of semi-simplicial sets is a natural transformation.

As with any presheaf category, every object $X \in \mathrm{Ob}(\mathcal{S})$ has a category of
simplices, which we denote $\mathrm{El}(X)$. Its objects are the simplices of X and its morphisms
are the inclusions of simplices. It can be described precisely as the opposite of the
Grothendieck construction, $\mathrm{El}(X) = \mathrm{Gr}(X)^{op}$. Recall that the Grothendieck
construction of X is the category whose objects are pairs $(A, x)$ where $A \in \mathbf{Fin}_m$ is
a finite non-empty set and $x \in X(A)$ is an element in the set $X(A)$. A morphism
$(A, x) \to (A', x')$ in $\mathrm{Gr}(X)$ is a monomorphism $f\colon A' \to A$ such that $X(f)(x) = x'$.

One can visualize an object $X \in \mathcal{S}$ as the union of its simplices. In other words,
there is a functor $\mathrm{El}(X) \to \mathrm{Pre}(\mathbf{Fin}_m)$ called the diagram of simplices of X, and
we shall soon see that X is the colimit of its diagram of simplices. We will now
explain this idea. There is a Yoneda functor $y\colon \mathbf{Fin}_m \to \mathcal{S}$, sending A to
$yA := \mathrm{Hom}_{\mathbf{Fin}_m}(-, A)$, which is fully faithful. Given an object
$(X\colon \mathbf{Fin}_m^{op} \to \mathbf{Sets}) \in \mathcal{S}$ and a finite set A, one has $X(A) = \mathrm{Hom}_{\mathcal{S}}(yA, X)$.

If one considers every object in $\mathcal{S}$ to be a "formal union of objects in $\mathbf{Fin}_m$",
then the Yoneda imbedding realizes each object of $\mathbf{Fin}_m$ as the union of just itself.
In general, any object $X \in \mathcal{S}$ is the colimit of its diagram of simplices:

\[
X \cong \operatorname*{colim}_{yA \to X} yA. \tag{1}
\]

One can visualize the Yoneda image of any finite non-empty set A as the "polygonal
hull" of A as a set of vertices: a one-point set is seen as a vertex, a two-point
set is seen as a line-segment, a three-point set is seen as a triangle, a 4-point set is
seen as a tetrahedron, etc. The isomorphism (1) displays X as the union of these basic
shapes.

This visualization can be made precise in the following sense. There is a "geometric
realization" functor $\mathfrak{R}\colon \mathcal{S} \to \mathbf{Top}$, where $\mathbf{Top}$ is the category of topological
spaces. It is constructed as follows. Any finite non-empty set A is isomorphic to
$\{0, 1, \ldots, n\}$ for some $n \in \mathbb{Z}_{\geq 0}$. The n-simplex $\Delta^n$ can be realized in $\mathbf{Top}$ as the
subspace of $\mathbb{R}^{n+1}$ whose points are

\[
\Delta^n := \mathfrak{R}(yA) := \{(x_0, x_1, \ldots, x_n) \in \mathbb{R}^{n+1}_{\geq 0} \mid x_0 + x_1 + \cdots + x_n = 1\}.
\]

Morphisms of finite sets can be linearly extended to continuous functions between
these spaces. To complete the construction, given any $X \in \mathcal{S}$, we set

\[
\mathfrak{R}(X) = \operatorname*{colim}_{yA \to X} \mathfrak{R}(yA),
\]

a formal union of topological simplices.
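As a small worked instance of this construction (our own illustration, writing $\mathfrak{R}$ for the geometric realization functor and choosing $A = \{0, 1\}$ for concreteness), the realization of a two-point set is

\[
\Delta^1 = \{(x_0, x_1) \in \mathbb{R}^2_{\geq 0} \mid x_0 + x_1 = 1\},
\]

the line segment from $(1, 0)$ to $(0, 1)$. The monomorphism $f\colon \{0\} \to \{0, 1\}$ with $f(0) = 0$ extends linearly to

\[
\mathfrak{R}(f)\colon \Delta^0 \to \Delta^1, \qquad (1) \mapsto (1, 0),
\]

which includes the single point $\Delta^0$ as one endpoint of the segment.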

We can alter the above without making a significant difference. Instead of looking
at $\mathbf{Fin}_m$, we consider the comma category $(\mathbf{Fin}_m \downarrow \mathbf{DT})$ for a given set $\mathbf{DT}$ of data
types. We set $\mathcal{S}_{\mathbf{DT}} = \mathrm{Pre}(\mathbf{Fin}_m \downarrow \mathbf{DT})$. The objects of $\mathcal{S}_{\mathbf{DT}}$ are simplices as above,
except that every vertex is labeled by a data type. A morphism of $\mathcal{S}_{\mathbf{DT}}$ is also as
above, with the added requirement that vertices are sent to vertices of the same
DT-label.

A simplicial database X is more than just a simplicial set X with labeled vertices;
it also has a sheaf of data $\mathcal{O}_X$. Each simplex $x \in X(\sigma)$, where $\sigma\colon A \to \mathbf{DT}$, represents a
table $\mathcal{O}_X(x)$ whose attributes are $\sigma(A) \subset \mathbf{DT}$. All this is made more precise in [Spi].
A.2. Adding tiles. Certain joins of databases can be done by simply "dragging
and dropping tiles." In other words, given two tiles $(\sigma_1, \tau_1)$ and $(\sigma_2, \tau_2)$ that share
a common face $(\sigma, \tau)$, we can connect them together without additional thought as
follows.

For any simplex $\sigma\colon A \to \mathbf{DT}$, there is a universal table on it, which in the
parlance of [Spi] is $\Gamma(\sigma)$. The limit of the diagram

\[
(\sigma_1, \tau_1) \to (\sigma, \Gamma(\sigma)) \leftarrow (\sigma_2, \tau_2)
\]

can be taken in the category of simplicial databases DB, and the result is two
simplices connected along A.

The above construction can always be done. Sometimes, instead of using the 
universal table on A, one may opt for a more controlled table on A. This is detailed 
in [Spi]. In terms of visualization, one may imagine that the tiles have additional
markings on their A-faces to indicate compatibility. These markings in fact indicate 
the specific table along which we are joining. 

A.3. Drawing paths. Drawing a continuous path through a picture of a (symmetric
semi-) simplicial set X can be considered a zig-zag of morphisms in the category
Gr(X). The starting point of the curve is an object in Gr(X). As one draws the
curve, it passes from a simplex to one of its faces or vice versa. Each time it moves
from a simplex to one of its faces, there is a corresponding morphism in Gr(X).
Each time it moves from a face to a bigger simplex, there is a morphism "going the
opposite way" in Gr(X). Thus this path results in a zigzag of arrows in Gr(X).

For example, consider two triangles ABC, CDE attached along a vertex. We
begin at the side AB, run through the interior of the first triangle, exit through
vertex C and travel along CD until we get to D. This gives the zigzag

\[
AB \leftarrow ABC \to C \leftarrow CD \to D.
\]

Thus, to explain the construction we discussed in Section 4, we need to give
functors $F\colon \mathrm{Gr}(X) \to \mathbf{Cat}$ and $G\colon \mathrm{Gr}(X)^{op} \to \mathbf{Cat}$ that agree on objects. These
are developed in [Spi], but we need to explain them here.

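Given a simplex $\sigma \in \mathrm{Gr}(X)$, assign $F(\sigma) = G(\sigma) = \mathbf{Tables}_\sigma$. If $i\colon \sigma \subset \sigma'$ is
a subsimplex, we need functors $F(i)\colon \mathbf{Tables}_{\sigma'} \to \mathbf{Tables}_\sigma$ and
$G(i)\colon \mathbf{Tables}_\sigma \to \mathbf{Tables}_{\sigma'}$. These are the pullback and push-forward functors
$F(i) := i^*$ and $G(i) := i_+$.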

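To be explicit, suppose we have $i\colon \sigma \subset \sigma'$ as above and a table $\tau'\colon K' \to \Gamma(\sigma')$.
The functor F(i) sends $\tau'$ to the bottom row in the commutative diagram

\[
\begin{array}{ccc}
K' & \xrightarrow{\;\tau'\;} & \Gamma(\sigma') \\
\| & & \downarrow \\
K' & \xrightarrow{\;F(i)(\tau')\;} & \Gamma(\sigma)
\end{array}
\]

The functor F(i) acts similarly on morphisms of tables: simply replace both
instances of $K'$ above with a morphism $K'_1 \to K'_2$.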
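The functor G(i) sends a table $\tau\colon K \to \Gamma(\sigma)$ to the bottom row in the pullback
diagram

\[
\begin{array}{ccc}
K & \xrightarrow{\;\tau\;} & \Gamma(\sigma) \\
\uparrow & & \uparrow \\
K \times_{\Gamma(\sigma)} \Gamma(\sigma') & \xrightarrow{\;G(i)(\tau)\;} & \Gamma(\sigma')
\end{array}
\]

The functor G(i) acts similarly on morphisms of tables:

\[
\begin{array}{ccccc}
K_1 & \longrightarrow & K_2 & \longrightarrow & \Gamma(\sigma) \\
\uparrow & & \uparrow & & \uparrow \\
K_1 \times_{\Gamma(\sigma)} \Gamma(\sigma') & \longrightarrow & K_2 \times_{\Gamma(\sigma)} \Gamma(\sigma') & \longrightarrow & \Gamma(\sigma')
\end{array}
\]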

Given F and G, we can now construct a functor from the category of zigzags in 
Gr(X) to the category of categories, as above. It is this construction that we use 
to query the database. Given a path through the schema, we get a zigzag in Gr(X) 
as above, then apply F and G to get a functor from tables over the start-point to 
tables over the end-point. 
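The action of F(i) and G(i) on tables, and their composition along a zigzag, can be sketched in Python. The sketch assumes every data type has a small finite domain, so that the universal table on a simplex can be listed outright; the `DOMAINS` dictionary and all function names are hypothetical and serve only as illustration.

```python
from itertools import product

# Hypothetical finite data types, so that universal tables are finite.
DOMAINS = {"A": [0, 1], "B": ["x", "y"], "C": [True, False]}

def gamma(sigma):
    """Universal table on the simplex sigma: every possible row over its vertices."""
    return [dict(zip(sigma, values)) for values in product(*(DOMAINS[a] for a in sigma))]

def F(sigma, table):
    """Pass to the face sigma: compose with the projection, i.e. drop the other columns."""
    return [{a: row[a] for a in sigma} for row in table]

def G(sigma_prime, table):
    """Pass to the bigger simplex: fiber product with its universal table."""
    return [
        extension
        for row in table
        for extension in gamma(sigma_prime)
        if all(extension[a] == row[a] for a in row)
    ]

def apply_zigzag(table, steps):
    """Apply F for each ("face", simplex) step and G for each ("coface", simplex) step."""
    for kind, simplex in steps:
        table = F(simplex, table) if kind == "face" else G(simplex, table)
    return table

start = [{"A": 0, "B": "x"}]
end = apply_zigzag(start, [("face", ("B",)), ("coface", ("B", "C"))])
```

In this example the path runs from the edge AB through the vertex B to the edge BC, and each starting row is extended in every way allowed by the universal table on BC.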

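We should record one more important point. As we have been saying, a path
through X induces a zigzag of tables

\[
(\sigma_1, \tau_1) \to (\sigma_2, \tau_2) \leftarrow (\sigma_3, \tau_3) \to (\sigma_4, \tau_4) \leftarrow (\sigma_5, \tau_5)
\]

in the form of projects and selects. Write $\tau_i\colon K_i \to \Gamma(\sigma_i)$, and suppose that
$K'_1 \to K_1$ is a table over $\tau_1$. Applying the above constructions we get a diagram

\[
\begin{array}{ccccccccc}
K'_1 & = & K'_2 & \leftarrow & K'_3 & = & K'_4 & \leftarrow & K'_5 \\
\downarrow & & \downarrow & & \downarrow & & \downarrow & & \downarrow \\
K_1 & \to & K_2 & \leftarrow & K_3 & \to & K_4 & \leftarrow & K_5 \\
\downarrow & & \downarrow & & \downarrow & & \downarrow & & \downarrow \\
\Gamma(\sigma_1) & \to & \Gamma(\sigma_2) & \leftarrow & \Gamma(\sigma_3) & \to & \Gamma(\sigma_4) & \leftarrow & \Gamma(\sigma_5)
\end{array}
\]

with $K'_2 := K'_1$, $K'_3 := K'_2 \times_{K_2} K_3$, $K'_4 := K'_3$, and $K'_5 := K'_4 \times_{K_4} K_5$,
in which $K'_2, \ldots, K'_5$ are induced either as equalling the previous, or by fiber
product. What we have not yet said is that from this, we can construct a function
$K'_5 \to K'_1$. Such a morphism will exist for any zigzag in Gr(X).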

Finally, we have maps $K'_5 \to \Gamma(\sigma_5)$ and $K'_5 \to K'_1 \to \Gamma(\sigma_1)$, which induce a
single map $K'_5 \to \Gamma(\sigma_1) \times \Gamma(\sigma_5)$. This map gives a table with attributes $\sigma_1 \amalg \sigma_5$ and is
the graph of the functor given by the zigzag applied to our starting table $K'_1$.
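The graph can be computed by carrying each starting row along the zigzag. The sketch below is a simplified reading of the construction: rows are dictionaries, a step to a face drops columns, and a step to a bigger simplex takes the fiber product with the stored table over that simplex. The sample tables and the name `run_zigzag` are hypothetical.

```python
def run_zigzag(start_rows, steps):
    """Return pairs (starting row, final row): the graph of the zigzag on start_rows."""
    pairs = [(row, row) for row in start_rows]
    for kind, data in steps:
        if kind == "face":
            # data is the tuple of attributes of the face: project onto it
            pairs = [(first, {a: row[a] for a in data}) for first, row in pairs]
        else:
            # data is the stored table over the bigger simplex: fiber product
            pairs = [
                (first, bigger)
                for first, row in pairs
                for bigger in data
                if all(bigger[a] == row[a] for a in row)
            ]
    return pairs

K1_prime = [{"Name": "Ann", "SSN": "111"}]
K3 = [{"SSN": "111", "State": "OR"}, {"SSN": "222", "State": "WA"}]
K5 = [{"State": "OR", "Capital": "Salem"}, {"State": "WA", "Capital": "Olympia"}]

graph = run_zigzag(
    K1_prime,
    [("face", ("SSN",)), ("coface", K3), ("face", ("State",)), ("coface", K5)],
)
graph_table = [{**first, **last} for first, last in graph]  # columns of both ends
```

Each entry of `graph` remembers which starting row it came from, which is the role played by the function from the final table back to the starting one.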